Design and Evaluation of Inflectional Stemmer for Bulgarian
Abstract
The paper starts with an overview of some important approaches to stemming for English and other languages. Then, the design, implementation and evaluation of the BulStem inflectional stemmer for Bulgarian are presented. The problem is addressed from a machine-learning perspective using a large morphological dictionary. A detailed automatic evaluation in terms of under-stemming, over-stemming and coverage is provided. In addition, the effect of stemming and of the BulStem parameter settings is demonstrated on a particular task: text categorisation using kNN+LSA.
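The abstract does not say how stemming rules are learned from the dictionary or how under-stemming, over-stemming and coverage are counted. The Python sketch below shows one simple way to do both. It is not BulStem's actual algorithm, and every detail is an assumption. The dictionary maps each lemma to all of its word forms. The shared stem of a paradigm is taken as the longest common prefix of its forms (via `os.path.commonprefix`). A rule keys on the inflectional ending plus `context` preceding stem letters and records how many letters to strip, with conflicts settled by majority vote. At stemming time the longest matching rule wins. The error counts use a simple pairwise definition: same lemma but different stems for under-stemming, different lemmas but the same stem for over-stemming. This may differ from the paper's measures. The names `learn_rules`, `stem_word`, `evaluate` and `DICTIONARY`, and the toy data, are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations
from os.path import commonprefix

# Toy morphological dictionary (hypothetical data): lemma -> all word forms.
DICTIONARY = {
    "книга": ["книга", "книги", "книгата", "книгите"],
    "стол": ["стол", "столове", "столът", "стола"],
    "град": ["град", "градове", "градът", "града"],
}


def learn_rules(dictionary, context=1):
    """Learn suffix-stripping rules from lemma -> word-forms paradigms.

    Hypothetical rule format: key = `context` stem letters + ending,
    value = number of trailing letters to strip (majority vote).
    """
    votes = defaultdict(Counter)
    for forms in dictionary.values():
        stem = commonprefix(forms)  # shared stem = longest common prefix
        for form in forms:
            ending = form[len(stem):]
            key = form[max(0, len(stem) - context):]
            votes[key][len(ending)] += 1
    return {key: c.most_common(1)[0][0] for key, c in votes.items()}


def stem_word(word, rules):
    """Apply the longest matching rule; return (stem, whether a rule fired)."""
    for size in range(len(word), -1, -1):  # longest suffix first, "" last
        strip = rules.get(word[len(word) - size:])
        if strip is not None and strip < len(word):  # never strip everything
            return word[:len(word) - strip], True
    return word, False


def evaluate(dictionary, rules):
    """Return (under-stemmed pairs, over-stemmed pairs, coverage).

    Simple pairwise definitions (an assumption, not the paper's measures):
    under = same lemma, different stems; over = different lemmas, same stem.
    """
    items, fired = [], 0
    for lemma, forms in dictionary.items():
        for form in forms:
            stem, ok = stem_word(form, rules)
            items.append((lemma, stem))
            fired += int(ok)
    under = over = 0
    for (l1, s1), (l2, s2) in combinations(items, 2):  # O(n^2): toy sizes only
        if l1 == l2 and s1 != s2:
            under += 1
        elif l1 != l2 and s1 == s2:
            over += 1
    return under, over, fired / len(items)


if __name__ == "__main__":
    for ctx in (1, 0):
        rules = learn_rules(DICTIONARY, context=ctx)
        print(ctx, evaluate(DICTIONARY, rules), stem_word("водите", rules))
    # 1 (0, 0, 1.0) ('водите', False)
    # 0 (0, 0, 1.0) ('вод', True)
```

On this toy dictionary both settings stem every training form consistently. Only the `context=0` rules fire on the unseen form `водите`, stripping `-ите` to give `вод`. On toy data only, this illustrates the trade-off between rule specificity and coverage that a context-size parameter can control.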
Similar resources
BulStem: Design and Evaluation of Inflectional Stemmer for Bulgarian
The paper starts with an overview of some important approaches to stemming for English and other languages. Then, the design, implementation and evaluation of BulStem, a freely available inflectional stemmer for Bulgarian, are presented. The problem is addressed from a machine-learning perspective using a large morphological dictionary. A detailed automatic evaluation in terms of under-stemmin...
Stemming Approaches for East European Languages
During this CLEF evaluation campaign, the first objective is to propose and evaluate various indexing and search strategies for the Czech language that will hopefully result in more effective retrieval than language-independent approaches (n-gram). Based on the stemming strategy we developed for other languages, we propose that for the Slavic language a light stemmer (inflectional only) and als...
Hungarian and Czech Stemming using YASS
This is the second year in a row we are participating in CLEF. Our aim is to test the performance of a statistical stemmer on various languages. Last year, we tried the stemmer on French; this year, we opted for Hungarian, Bulgarian and Czech. We were unable to complete the Bulgarian task, but submitted official runs for the ad hoc monolingual Hungarian and Czech tasks. We find that, for both la...
Bulgarian Inflectional Morphology in Universal Networking Language
The paper presents a web-based application of semantic networks to model Bulgarian inflectional morphology. It demonstrates the general ideas, principles, and problems of inflectional grammar knowledge representation used for encoding Bulgarian inflectional morphology in Universal Networking Language (UNL). The analysis of the UNL formalism is outlined in terms of its expressive power to present in...
To stem or lemmatize a highly inflectional language in a probabilistic IR environment?
Effects of three different morphological methods (lemmatization, stemming and inflectional stem generation) for Finnish are compared in a probabilistic IR environment (INQUERY). Evaluation is done using a four-point relevance scale which is partitioned differently in different test settings. Results show that inflectional stem generation, which has not been used much in IR, compares well with lemm...
Publication date: 1998